    Don't cry over spilled records: Memory elasticity of data-parallel applications and its application to cluster scheduling

    Understanding the performance of data-parallel workloads when resource-constrained has significant practical importance but unfortunately has received only limited attention. This paper identifies, quantifies and demonstrates memory elasticity, an intrinsic property of data-parallel tasks. Memory elasticity allows tasks to run with significantly less memory than they would ideally want while paying only a moderate performance penalty. For example, we find that given as little as 10% of ideal memory, PageRank and NutchIndexing Hadoop reducers become only 1.2x/1.75x and 1.08x slower. We show that memory elasticity is prevalent in the Hadoop, Spark, Tez and Flink frameworks. We also show that memory elasticity is predictable in nature by building simple models for Hadoop and extending them to Tez and Spark. To demonstrate the potential benefits of leveraging memory elasticity, this paper further explores its application to cluster scheduling. In this setting, we observe that the resource vs. time trade-off enabled by memory elasticity becomes a task queuing time vs. task runtime trade-off. Tasks may complete faster when scheduled with less memory because their waiting time is reduced. We show that a scheduler can turn this task-level trade-off into improved job completion time and cluster-wide memory utilization. We have integrated memory elasticity into Apache YARN. We show gains of up to 60% in average job completion time on a 50-node Hadoop cluster. Extensive simulations show similar improvements over a large number of scenarios.
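
    The scheduling idea in this abstract, accepting a somewhat longer elastic runtime in exchange for a much shorter queuing time, can be illustrated with a small back-of-the-envelope model. The sketch below is a minimal illustration under assumed numbers: the penalty curve, the 2x cap, and all timings are invented for demonstration and are not the paper's model.

    # Illustrative sketch (not the paper's model) of the queuing-time vs. runtime
    # trade-off enabled by memory elasticity. All numbers are assumed.

    def elasticity_penalty(memory_fraction: float) -> float:
        """Assumed runtime multiplier when a task receives only a fraction of its
        ideal memory. The abstract reports moderate penalties (roughly 1.1x-1.75x
        at 10% of ideal memory); this toy curve merely mimics that shape."""
        if memory_fraction >= 1.0:
            return 1.0
        return min(2.0, 1.0 + 0.1 / max(memory_fraction, 0.05))

    def completion_time(ideal_runtime_s: float, memory_fraction: float,
                        queue_wait_s: float) -> float:
        """Queuing time plus the elasticity-inflated runtime."""
        return queue_wait_s + ideal_runtime_s * elasticity_penalty(memory_fraction)

    # A task that ideally runs in 100 s: wait 80 s for its full allocation,
    # or start immediately with 25% of ideal memory?
    wait_for_full = completion_time(100.0, 1.0, queue_wait_s=80.0)   # 180 s
    run_elastic = completion_time(100.0, 0.25, queue_wait_s=0.0)     # 140 s
    print(f"wait for ideal memory: {wait_for_full:.0f}s, run elastically: {run_elastic:.0f}s")

    Under these assumed numbers the elastic schedule finishes sooner, which is the task-level trade-off the paper's scheduler exploits at cluster scale.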

    Unsupervised Machine Learning for Networking: Techniques, Applications and Research Challenges

    While machine learning and artificial intelligence have long been applied in networking research, the bulk of such work has focused on supervised learning. Recently, there has been a rising trend of employing unsupervised machine learning on unstructured raw network data to improve network performance and provide services such as traffic engineering, anomaly detection, Internet traffic classification, and quality of service optimization. The interest in applying unsupervised learning techniques in networking stems from their great success in other fields such as computer vision, natural language processing, speech recognition, and optimal control (e.g., for developing autonomous self-driving cars). Unsupervised learning is interesting because it can free us from the need for labeled data and manually handcrafted feature engineering, thereby facilitating flexible, general, and automated methods of machine learning. The focus of this survey paper is to provide an overview of the applications of unsupervised learning in the domain of networking. We provide a comprehensive survey highlighting recent advancements in unsupervised learning techniques and describe their applications in various learning tasks in the context of networking. We also provide a discussion of future directions and open research issues, while identifying potential pitfalls. While a few survey papers focusing on the applications of machine learning in networking have previously been published, a survey of similar scope and breadth is missing in the literature. Through this paper, we advance the state of knowledge by carefully synthesizing the insights from these survey papers while also providing contemporary coverage of recent advances.
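
    As a concrete flavor of the techniques this survey covers, the sketch below applies unsupervised clustering to synthetic per-flow features for simple anomaly detection. It is not taken from the survey: the features, the scikit-learn k-means call, and the minority-cluster heuristic are assumptions chosen purely for illustration.

    # Minimal illustration of unsupervised anomaly detection on network flows.
    # Synthetic data and a minority-cluster heuristic; assumed, not from the survey.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Per-flow features: [packets/s, mean packet size (bytes), duration (s)]
    normal_flows = rng.normal(loc=[50, 800, 2.0], scale=[10, 100, 0.5], size=(500, 3))
    scan_flows = rng.normal(loc=[400, 60, 0.1], scale=[50, 10, 0.05], size=(20, 3))
    flows = np.vstack([normal_flows, scan_flows])

    # No labels are used: cluster structure is learned purely from the data.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(flows)

    # Flag the smaller cluster as suspicious (a simple, common heuristic).
    sizes = np.bincount(km.labels_)
    suspicious = int(np.argmin(sizes))
    print(f"{sizes[suspicious]} flows fall in the minority (suspicious) cluster")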

    Can silhouette execution mitigate virtual machine boot storms?

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 107-108). Server virtualization enables data centers to run many VMs on individual hosts, which reduces costs, simplifies administration and facilitates management. Improvements in hardware and virtualization technology, coupled with the use of virtualization for desktop machines with modest steady-state resource utilization, are expected to allow individual hosts to run thousands of VMs at the same time. Such high VM densities per host would allow data centers to reap unprecedented cost savings in the future. Unfortunately, the unusually high CPU and memory pressure generated when many VMs boot up concurrently can cripple hosts that could otherwise run many VMs. Overprovisioning hardware to avoid the prohibitively high boot latencies that result from these often daily boot storms is clearly expensive. The aim of this thesis is to investigate whether a hypervisor could theoretically exploit the overlap in the instruction streams of concurrently booting VMs to reduce CPU pressure in boot storms. This idea, which we name silhouette execution, would allow hypervisors to use the CPU in a scalable way, much as transparent page sharing allows a hypervisor to use its limited memory in a scalable fashion. To evaluate silhouette execution, we studied user-space instruction streams from a few Linux services using dynamic instrumentation. We statistically profiled the extent of nondeterminism in program execution and compiled the reasons behind any execution differences. Though there is significant overlap in the user-mode instruction streams of Linux services, our simple simulations show that silhouette execution would increase CPU pressure by 13% for 100 VMs and 6% for 1000 VMs. To remedy this, we present a few strategies for reducing synthetic differences in execution in user-space programs. Our simulations show that silhouette execution can reduce CPU pressure on a host by a factor of 8x for 100 VMs and a factor of 19x for 1000 VMs once these strategies are used. We believe that the insights provided in this thesis on controlling execution differences in concurrently booting VMs via dynamic instrumentation are a prelude to a successful future implementation of silhouette execution. By Syed Aunn Hasan Raza. M.Eng.
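
    The core intuition, that a hypervisor could execute the shared portion of overlapping boot-time instruction streams once and reuse it for every VM, can be captured in a toy accounting model. The sketch below is not the thesis's simulator; the stream length and the 95% overlap fraction are assumed values, used only to show how the potential savings scale with VM density.

    # Toy accounting model (not the thesis's simulator) for silhouette execution.
    # Assumes a fixed fraction of each VM's boot instruction stream is shared.

    def cpu_work(num_vms: int, stream_len: int, shared_fraction: float) -> tuple[int, int]:
        """Return (instructions without sharing, instructions with idealized sharing)."""
        baseline = num_vms * stream_len
        shared = int(stream_len * shared_fraction)   # executed once for all VMs
        divergent = stream_len - shared              # still executed per VM
        return baseline, shared + num_vms * divergent

    for vms in (100, 1000):
        base, with_sharing = cpu_work(vms, stream_len=1_000_000, shared_fraction=0.95)
        print(f"{vms:>4} VMs: idealized CPU-work reduction ~{base / with_sharing:.1f}x")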

    GPU-accelerated data management under the test of time

    GPUs are becoming increasingly popular in large-scale data center installations due to their strong, embarrassingly parallel processing capabilities. Data management systems are riding the wave by using GPUs to accelerate query execution, mainly for analytical workloads. However, this acceleration comes at the price of a slow interconnect, which imposes strong restrictions on bandwidth and latency when bringing data from main memory to the GPU for processing. The related research in data management systems mostly relies on late materialization and data sharing to mitigate the overheads introduced by slow interconnects, even in the standard CPU processing case. Meanwhile, workload trends are moving beyond analytical queries toward fresh data processing, typically referred to as Hybrid Transactional and Analytical Processing (HTAP). We therefore experience an evolution along three different axes: interconnect technology, GPU architecture, and workload characteristics. In this paper, we break the evolution of the technological landscape into steps and study the applicability and performance of late materialization and data sharing at each one of them. We demonstrate that the standard PCIe interconnect substantially limits the performance of state-of-the-art GPUs, and we propose a hybrid materialization approach which combines eager with lazy data transfers. Further, we show that the wide gap between GPU and PCIe throughput can be bridged through efficient data sharing techniques. Finally, we provide an H2TAP system design which removes software-level interference, and we show that interference in the memory bus is minimal, allowing data transfer optimizations as in OLAP workloads.
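
    The hybrid materialization idea, deciding per column between shipping everything eagerly and shipping only qualifying rows lazily, can be sketched as a simple transfer-cost comparison. The model below is an assumption-laden illustration, not the paper's policy: the PCIe bandwidth, the fixed lazy-transfer overhead, and the example column sizes are all invented for demonstration.

    # Rough cost-model sketch (not the paper's policy) for choosing between eager
    # and lazy data transfer to the GPU. All constants are assumed.

    PCIE_GBPS = 12.0  # assumed effective PCIe 3.0 x16 bandwidth, GB/s

    def transfer_seconds(num_rows: int, bytes_per_row: int) -> float:
        return (num_rows * bytes_per_row) / (PCIE_GBPS * 1e9)

    def choose_strategy(num_rows: int, bytes_per_row: int, selectivity: float,
                        lazy_overhead_s: float = 0.01) -> str:
        """Pick the cheaper plan for one column: eager ships every row up front;
        lazy pays a fixed overhead (gather plus an extra round trip, assumed here)
        but ships only the rows that survive the filter."""
        eager = transfer_seconds(num_rows, bytes_per_row)
        lazy = lazy_overhead_s + transfer_seconds(int(num_rows * selectivity), bytes_per_row)
        return "eager" if eager <= lazy else "lazy"

    # A 100M-row, 8-byte column: a highly selective filter favors lazy transfers.
    print(choose_strategy(100_000_000, 8, selectivity=0.9))   # eager
    print(choose_strategy(100_000_000, 8, selectivity=0.01))  # lazy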